Informing the Physical Activity Evaluation Framework: A Scoping Review of Reviews

Objective. Robust program evaluations can identify effective promotion strategies. This scoping review analyzed review articles (systematic reviews, meta-analyses, meta-syntheses, scoping reviews, narrative reviews, rapid reviews, critical reviews, and integrative reviews) to systematically map and describe physical activity program evaluations published between January 2014 and July 2020, summarize key characteristics of the published literature, and suggest opportunities to strengthen current evaluations.
Data Source. We conducted a systematic search of the following databases: Medline, Scopus, SPORTDiscus, ERIC, PsycInfo, and CINAHL.
Inclusion/Exclusion Criteria. Abstracts were screened for inclusion based on the following criteria: review article, English language, human subjects, primary prevention focus, physical activity evaluation, and evaluations conducted in North America.
Extraction. Our initial search yielded 3193 articles; 211 review articles met the inclusion criteria.
Synthesis. We describe review characteristics, evaluation measures, and "good practice characteristics" to inform evaluation strategies.
Results. Many reviews (72%) did not assess or describe the use of an evaluation framework or theory in the primary articles that they reviewed. Among those that did, terminology varied considerably, making comparisons difficult. Process indicators were more common than outcome indicators (63.5% vs 46.0%). Participant characteristics received little attention, with 29.4% of reviews capturing characteristics such as race, income, and neighborhood. Negative consequences of program participation and program efficiency were infrequently considered (9.3% and 13.7%, respectively).
Conclusion. Attention to contextual factors, negative outcomes, the use of evaluation frameworks, and measures of program sustainability would strengthen evaluations and provide an evidence base for physical activity programming, policy, and funding.

· 150-300 minutes of moderate-intensity aerobic physical activity per week, or
· At least 75-150 minutes of vigorous-intensity aerobic physical activity per week, or
· An equivalent combination of moderate- and vigorous-intensity activity throughout the week
Additionally, WHO 6 suggests that children and youth should obtain 60 minutes of moderate-to-vigorous intensity physical activity per day. Only 18% of Canadian adults and 9.5% of Canadian children and youth are meeting physical activity guidelines. 3
A program evaluation is "the systemic collection of information about the activities, characteristics and outcomes of a program to make judgements about the program, improve program effectiveness, and/or inform decisions about future program development." 10 Evaluations are important tools for program improvement while also exerting influence on policy and funding streams, building community capacity, and facilitating information sharing between communities and programs. 11,12 As described by the US Department of Health and Human Services (HHS), 12 the type of evaluation used depends on the purpose of the evaluation and when it is conducted within the program's life cycle. Implementation/process evaluations assess the inputs and activities of a program (e.g., is the program being delivered as planned, what are the external influences, and is the program within time and resource capacity). Effectiveness/outcome evaluations measure the short-term, intermediate, or long-term effect(s) of the program (e.g., what was accomplished, is the program effective, and were there any unintended effects). 11,12
There are national and global frameworks to support the development of physical activity initiatives and guide their evaluation. In 2006, the WHO developed the Global Strategy on Diet, Physical Activity, and Health: A Framework to Monitor and Evaluate Implementation. 13 The objective of this strategic approach was to provide a framework and indicators that could be used in a physical activity program evaluation. 13 Additionally, in 2011 the US Centers for Disease Control and Prevention released a guide for evaluation in public health. 12 In 2012, stakeholders of the pan-Canadian physical activity collaboration developed Active Canada 20/20, which provides a local, regional, provincial/territorial, and national framework for physical activity promotion. 14 Active Canada 20/20 advocates for the adoption of population-based strategies, with specific attention to population sub-groups facing the greatest barriers to physical activity. 15 This approach is supported by evidence that program participation and rates of physical activity are impacted by characteristics such as race, income, migrant status, and neighborhood factors. 5,11,13-16
The first research action of the European Union's Joint Programming initiative, the Determinants of Diet and Physical Activity (DEDIPAC) Knowledge Hub, conducted an umbrella review in 2015 to identify "Good Practice Characteristics" to assist in monitoring and evaluating interventions and policies that promote a healthy diet, increase participation in physical activity, and reduce sedentary behaviors. 17 The Good Practice Characteristics addressed costs, outcomes, measurements, and process evaluation aspects. 17 Together WHO, HHS, Active Canada 20/20, and DEDIPAC suggest a strategy to meet the diversity and complexity required to achieve sustainable behavior change as reflected in the Public Health Agency of Canada vision. 3
To further this work, Kosowan et al 18 assessed strengths, challenges, and opportunities in currently implemented physical activity strategies. One of the challenges that emerged was the need for guidance to develop and resource program evaluations that can inform physical activity strategies by highlighting current approaches as well as gaps and areas for improvement. 18 Several potentially relevant frameworks, indicators, and tools exist to address this challenge, but a question remains: to what extent are these frameworks and best practices reflected in published program evaluations? We therefore conducted this scoping review to systematically map and describe evaluations of physical activity programs in North America.

Aim
This review aimed to analyze review articles to systematically map and describe physical activity program evaluations published between January 2014 and July 2020, summarize key characteristics of the published literature, and suggest opportunities to strengthen current evaluations. This review identified the presence of key characteristics outlined in national (Active Canada 20/20 14,15) and international (WHO 6,13) program evaluation frameworks. We targeted reviews conducted between 2014 and 2020, following the release of tools aimed at tailoring evaluation frameworks to the local context. 15 In 2014, a national summit in Canada created the "Pathways to Wellbeing Framework for Recreation in Canada," producing a plan and commitment to action developed by physical activity stakeholders to address inactivity by 2020. 15 This Canadian plan references a similar plan developed for the United States of America, 19 and was informed by international strategies to address physical inactivity. 13 We provide direction for future evaluations by highlighting gaps in the literature and opportunities to strengthen current approaches to physical activity evaluation and monitoring.
A scoping review provides an overview of the extent, range, and nature of research activity available on a given topic. 20 Scoping reviews are well suited for understanding gaps in the research area of interest. Scoping reviews can range from a rapid review of key concepts and articles in the area to a comprehensive review of the topic. 20,21 The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) extension for scoping reviews checklist (Appendix C) provides assurance that this scoping review details essential items pertinent to describing evaluations of physical activity programs. 22 Using scoping review methodology to examine reviews of physical activity evaluations in North America since 2014, we provide a comprehensive description of evaluation frameworks, indicators, and measures to guide future program evaluations.

Data Sources
This review followed a protocol prepared by Goertzen and colleagues (2015) 23 using methods similar to previous review articles completed by our research team. 24-27 Led by a health sciences librarian, we conducted a systematic search of the following electronic databases: Medline, Scopus, SPORTDiscus, ERIC, PsycInfo, and CINAHL. The team developed a search strategy using controlled vocabulary and keywords to describe physical activity evaluations derived from WHO, Active Canada 20/20, and DEDIPAC. Searches were performed in October 2018, with an updated systematic search occurring in July 2020. The search strategy is outlined in Appendix A Table A1.

Inclusion and Exclusion Criteria
We used a two-stage screening process. First, three reviewers screened the titles of all included reviews in Rayyan, an online application to assist with systematic reviews. 28 Inclusion criteria were: review article, English language, human subjects, primary prevention focus, physical activity evaluation, and evaluations conducted in North America (which includes Canada, the United States of America, and Mexico) (Appendix A Table A1). Review articles considered for inclusion were systematic reviews, meta-analyses, meta-syntheses, scoping reviews, narrative reviews, rapid reviews, critical reviews, and integrative reviews. Any discrepancies were discussed and, when necessary, resolved by a fourth reviewer. Following the screening of the article titles, articles included for abstract review were downloaded from Rayyan into an Excel spreadsheet. Second, two researchers screened the abstracts of the remaining articles for inclusion based on the pre-determined criteria. If sufficient detail was not available in the abstract, the full text of the article was reviewed to determine eligibility for inclusion in the scoping review. Our systematic search yielded 3193 articles. After removing duplicate articles, we screened 2675 articles based on the inclusion/exclusion criteria. Following title and abstract screening, 211 review articles (Appendix B Table A2) met the inclusion criteria for the scoping review (PRISMA Flow Diagram provided in Figure 1) (Table A3).
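The screening counts reported above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the function name and the derived quantities (records removed before screening, records excluded at screening, inclusion rate) are assumptions introduced here and are not figures reported in the text. It also assumes that duplicates were the only records removed between the initial search and screening.

```python
def screening_flow(identified: int, screened: int, included: int) -> dict:
    """Derive simple flow quantities from three reported screening counts."""
    if not (identified >= screened >= included >= 0):
        raise ValueError("counts must be non-increasing along the flow")
    return {
        # Assumes only duplicates were removed before screening.
        "duplicates_removed": identified - screened,
        "excluded_at_screening": screened - included,
        "inclusion_rate_pct": round(100 * included / screened, 1),
    }


# Counts taken from the text: 3193 retrieved, 2675 screened, 211 included.
flow = screening_flow(identified=3193, screened=2675, included=211)
print(flow)
```

A check of this kind is a convenient way to confirm that the numbers in a flow diagram are internally consistent before publication.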

Data Extraction
Data were extracted from the abstract of each review article by two reviewers. Review articles described physical activity evaluations in North America. This description of the evaluation, as presented by the review article, was used to complete the data charting form. The data charting form was designed by the team and included the following:
(1) General detail (i.e., author, type of review, number of studies included in the review, reported timeframe, location, and review objective/aim)
(2) Evaluation focus derived from the WHO Global Strategy on Diet, Physical Activity, and Health: A Framework to Monitor and Evaluate Implementation 6,13 and the Active Canada 20/20 framework 14,15:
· type of evaluation (implementation/process, output, short-term, intermediate, or long-term outcome)
· focus of the indicators: context (social inequity, disease burden, media, and built environment), settings (community, school, workplace, and media), and evidence
(3) Good practice characteristics for monitoring and evaluating a physical activity program as defined by DEDIPAC 17 (i.e., costs considered (health benefit, behavior changes, intervention, policy, and economic), outcomes measured (physical, psychological, and both), effectiveness or efficiency, sustainability, effect, reach, participant characteristics and generalizability, underlying processes, and active components)
(4) Evaluation framework, theory, and evaluation indicators
To sufficiently capture these areas, the full text of each review article was assessed to complete the following columns in our data charting form: evaluation framework, theory, strategy, and what was measured. In addition to the theory name, reviewers documented, when available, whether the theory guided the evaluation or the program being evaluated.
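For readers who wish to adapt the charting form, its four parts could be represented as a simple record type. The Python sketch below is a hypothetical illustration: the class name, field names, and the placeholder example record are assumptions introduced here and do not reproduce the actual form or any real extracted data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChartingRecord:
    """One row of a data charting form; all field names are hypothetical."""
    # (1) General detail
    author: str
    review_type: str
    n_studies: int
    timeframe: str
    location: str
    objective: str
    # (2) Evaluation focus (WHO and Active Canada 20/20 categories)
    evaluation_types: List[str] = field(default_factory=list)
    indicator_context: List[str] = field(default_factory=list)
    settings: List[str] = field(default_factory=list)
    # (3) Good practice characteristics (DEDIPAC)
    good_practice: List[str] = field(default_factory=list)
    # (4) Evaluation framework, theory, and what was measured
    framework: Optional[str] = None
    theory: Optional[str] = None
    measures: List[str] = field(default_factory=list)

    def uses_framework_or_theory(self) -> bool:
        """True when a framework or a theory was recorded for the review."""
        return self.framework is not None or self.theory is not None


# Placeholder values only; this is not a record from the review.
example = ChartingRecord(
    author="Placeholder et al",
    review_type="scoping review",
    n_studies=0,
    timeframe="not reported",
    location="Canada and USA",
    objective="placeholder objective",
    evaluation_types=["implementation/process"],
    framework="RE-AIM",
)
print(example.uses_framework_or_theory())
```

Keeping the framework and theory fields optional makes it straightforward to count reviews that reported neither.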

Data Synthesis
Two reviewers screened the titles and abstracts, extracted all data into the data extraction form, and reviewed and discussed all discrepancies to reach consensus. Descriptive statistics were used to summarize the categories within the data extraction form. Additionally, the two reviewers summarized narrative examples that could inform evaluation strategies. All authors reviewed and discussed preliminary results to reach consensus on the key findings.

Results
There were 211 review articles published between January 2014 and July 2020 that collectively reviewed 8138 physical activity program evaluations. On average, there were 32 physical activity evaluation review articles each year. Five review articles focused exclusively on Canada, 29-33 90 focused exclusively on the USA, and 116 included studies from both Canada and the USA.
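The annual average and the regional breakdown follow directly from the reported counts. The Python sketch below reproduces them under one stated assumption: the publication window is counted inclusively from January 2014 through July 2020, giving 79 months. The variable and function names are illustrative, and the regional percentages are derived here rather than reported in the text.

```python
# Counts reported in the Results.
TOTAL_REVIEWS = 211
BY_REGION = {"Canada only": 5, "USA only": 90, "Canada and USA": 116}

# Assumption: January 2014 through July 2020, counted inclusively.
MONTHS_IN_WINDOW = 6 * 12 + 7  # 79 months


def reviews_per_year(total: int, months: int) -> float:
    """Average number of reviews per 12-month period."""
    return total / months * 12


def shares(counts: dict) -> dict:
    """Percentage share of each category, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


per_year = reviews_per_year(TOTAL_REVIEWS, MONTHS_IN_WINDOW)
region_shares = shares(BY_REGION)
print(round(per_year), region_shares)
```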

Consideration When Conceptualizing and Developing an Evaluation Plan
Scientific Evidence. There were a number of different evaluation frameworks and theoretical approaches discussed within the included reviews. Stanhope et al 34 point out that there are a variety of strong evaluation methods and measures; the decision on the approach to use should be tailored to the population and setting, and be based on the strengths and limitations of each approach. However, in the literature the terms "evaluation framework," "theory," and "strategy" were defined differently, making comparisons between approaches difficult. For example, social cognitive theory was defined as a framework, a theoretical approach, and a strategy among the included reviews. For the purpose of this scoping review, we followed the US Department of Health and Human Services (HHS) definitions: (1) an evaluation framework is a guide to summarize and organize essential elements of program evaluation; (2) a theory is a set of beliefs used to understand change; and (3) a strategy is a method used to gather evidence. 12
Evaluation Frameworks. Very few evaluations of physical activity programs described the use of a formal evaluation framework. This is not surprising, as the majority of reviews were of randomized controlled trials, which rarely include evaluation frameworks. A small number of frameworks were used by physical activity evaluations, with RE-AIM being the most commonly referenced. 38-51 Reviews reported that the RE-AIM framework was used to promote consistent reporting of intervention results from health promotion and disease management interventions by addressing multiple dimensions (populations, settings, and health conditions) and informing internal and external validity.
Almost all of the articles in our review reported that evaluations focused on internal factors such as program effectiveness (95.7%), with external factors such as cost-effectiveness of the program infrequently considered (13.7%).
Theory. Fifty-nine of 211 (28%) reviews reported that evaluations included in their study referred to a specific theory that either informed the program evaluation or the program being evaluated. For example, authors mentioned that the socio-ecological model was used to assess the presence of each system in evaluation measures 52 or was used to design the program to attend to each of the socio-ecological systems defined in the model. 53 There were 26 different theoretical approaches mentioned by the reviews of physical activity evaluations; the most common were social cognitive theory, transtheoretical model/stages of change, socio-ecological model, theory of planned behavior, and the health belief model (Table 1).
Universal and targeted approaches: populations and equity. Physical activity program evaluation reviews were primarily focused on individual adults (37.4%) and children and youth (35.5%). Population subgroups defined by ethnicity, disability, gender, and low income were identified in 28.4% of reviews (Figure 2). Without capturing characteristics of the program participants, the significance of an evaluation's findings to different groups cannot be determined.

Evaluation Indicators
Evaluation indicators are specific, observable, and measurable statements used to assess the program process and/or outcome. 12
Process Indicators. Program delivery models each have their own unique strengths and limitations that must be considered for program implementation and evaluation. 54 Process indicators describe and evaluate how an intervention was delivered. 12,13 Process evaluations typically used the following measurement techniques to assess the program:
· Physical education observation (e.g., duration of strength training components, minutes of physical activity in the lesson, and interactions between the participants and instructor)
· Self-reported program adherence by participants (e.g., completion of home exercises)
· Program schedules (e.g., amount and type of physical activity)
· Program records (e.g., participation rate and retention rate)
· Program descriptors (length, priority population, setting, and type)
Process indicators were used in the majority (63.5%) of the included reviews. Process indicators included measures to assess program delivery, resource utilization, and external influences. For example, 38.8% of reviews reported that process indicators were able to identify aspects of a program model or program context (e.g., location and time, clinician proficiency, parental involvement, and peer involvement) that could inform future approaches to increase physical activity participation. Process indicators were used to assess fidelity of program implementation (66.4%), and the dose-response relationship (e.g., minutes of physical activity delivered vs received) was mentioned in 18.7% of reviews. Indicators of program reach were focused on uptake and adoption (32.0%), retention and adherence (30.6%), and access and engagement (20.9%).
When considering the focus of the process indicators, 13,14 the majority (60.0%) of indicators in the included reviews considered social inequity, while media-focused strategies (20.0%), disease burden (12.0%), and the built environment (8.0%) were less prominent (Figure 3). Among articles that considered social inequity, ethnicity was the most common factor considered. Acknowledging the role of ethnicity in program development and implementation can influence program uptake as well as the ability of the program to retain participants and increase physical activity levels. 5,87 Child/youth populations were the focus of 41.0% (55/134) of reviews of evaluations with process indicators; 26.9% (36/134) aimed to develop school-based physical activity programs and policies (Figure 2). For example, McKenzie and Smith 88 describe the use of the System for Observing Fitness Instruction Time (SOFIT) in diverse settings to develop measures for assessing variations in programs developed for physical education. The measures include program structure/setting, teacher behaviors, and student characteristics. 88 The built environment was mentioned by both the WHO and Active Canada 20/20; however, it was infrequently assessed in evaluations. Review articles that did assess the built environment found that commonly reported measurement instruments included geographical information systems (GIS), global positioning systems (GPS), and neighborhood assessments. 89-91 These instruments were used to assess characteristics of the built environment such as green space, accessibility of buildings, and walkability of the neighborhood.
For example, McGrath et al reviewed evaluations that used GIS, GPS, and neighborhood assessments to measure the number of minutes of physical activity compared to the number of meters to the closest neighborhood park and housing density per square kilometer. 89
Table 1. Most common theoretical approaches and the references reporting them.
Social cognitive theory: 35-37,43,46-48
Transtheoretical model/stages of change: 35-37,47,48,51,52,55,56,61,62,66,68,72,75,79,80
Socio-ecological model: 47,51-53,57-59,66,67,70,71,77,80-84
Theory of planned behavior: 37,47,48,51,54,59,66,67,69,70,85
Health belief model: 68-70,72,76,77,86
Outcome Indicators. Outcome indicators measure effects or changes resulting from the program and can be grouped into output (e.g., direct product of the activity), short-term outcomes (e.g., increased knowledge), intermediate outcomes (e.g., behavior change), or long-term outcomes (e.g., disease prevention and management). 12,13 Output indicators focus on the immediate effect or product that results from the intervention. 11,12 A majority (56.7%, 76/134) of the reviews that assessed evaluations with process indicators also included evaluations with output indicators. For example, one review describes the relationship between implementation processes (e.g., after-school, summer, and multiple times a day) and program outputs (including minutes of physical activity, fruit and vegetable consumption, and caloric intake) as found within 28 different program evaluations. 40 Among reviews of evaluations with output indicators, 21.4% measured outputs such as self-reported increases in physical activity, number of people reached, number of sessions attended, and knowledge and attitudes towards behavior change (Figure 3). Some reviews also considered structures such as the built environment and its influence on access to, and benefits gained from, physical activity programs (42.9%).
For example, Calder et al 92 explored evaluations that measured access to public indoor fitness centers by people with disabilities. This included researcher observations of the setting, such as physical requirements needed to access equipment and bathrooms, as well as program availability, policies, and professionalism of staff. 92 Societal-level application of lessons learned from physical activity programs was the focus of 17.3% of reviews of evaluations with output indicators (Figure 2). For example, Hunter et al 93 suggest the need for more urban green space areas following their review of physical activity program evaluations that demonstrated an increase in physical activity if the program encouraged physical activity (PA) in urban green space. 93
Short-term, intermediate, and long-term outcome indicators measure the consequences of participating in the program and are typically measured months or years after the program. Disease burden was a large focus among these reviews (40.0%), largely aimed at determining long-term solutions for prevention of health conditions (Figure 3).
Almost half of the reviews in this scoping review (46.0%, 97/211) included evaluations with outcome indicators. Intermediate outcomes were more likely to be assessed by review articles (70.1%, 68/97) than short-term (21.6%, 21/97) or long-term (35.1%, 34/97) outcomes. Evaluation guides, such as best practice guidelines for accelerometer use, have been developed to guide assessment of physical activity outcome measures. 94 These guides aim to standardize how the instrument is used (e.g., location and time the device is worn) as well as the number of steps that represent being physically active.
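The percentages above can be recomputed from the reported counts, which also shows that the three outcome horizons are not mutually exclusive: their counts sum to more than the 97 reviews with outcome indicators. The Python sketch below is illustrative; the names are assumptions introduced here, and the interpretation of the surplus as reviews counted under more than one horizon is an inference from the counts rather than a statement made in the text.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


# Counts reported in the text.
TOTAL_REVIEWS = 211
WITH_OUTCOME_INDICATORS = 97
BY_HORIZON = {"short-term": 21, "intermediate": 68, "long-term": 34}

overall_pct = pct(WITH_OUTCOME_INDICATORS, TOTAL_REVIEWS)
horizon_pct = {name: pct(n, WITH_OUTCOME_INDICATORS) for name, n in BY_HORIZON.items()}

# A positive surplus means at least some reviews fall under more than one
# horizon, so the percentages are not expected to sum to 100.
surplus = sum(BY_HORIZON.values()) - WITH_OUTCOME_INDICATORS

print(overall_pct, horizon_pct, surplus)
```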
Outcome measures can be categorized into the following groups: anthropometric, behavioral, or psychological (Table 2). Most evaluations measured physical outcomes (i.e., anthropometric and behavioral changes) following participation in a physical activity program (86.6%), with fewer evaluations assessing psychological or cognitive outcomes (36.1%). Arbour-Nicitopoulos et al 85 reported that evaluations they reviewed used both physical (i.e., physical skill development) and psychological (i.e., psychological wellbeing) measures of outcomes for children and youth who participated in out-of-school physical activity programs. Psychological measures were more evident in evaluations focused on children and youth, older adults, or ethnic subpopulations. Some reviews also reported that evaluations used self-report (i.e., survey) measures for social support and relationships such as the CARDIA-2, 11 items or the Perceived Social Support Scale. 55 Many of the reviews discussed evaluations with outcome indicators focused on determining whether a physical activity program or strategy that was previously found to be effective would be effective in different population subgroups related to gender, disability, health conditions, or ethnicity/culture (40.0%). 55,87,95,96 Table 2 provides a more detailed outline of the kinds of measures used in each categorical group. Finally, only a small proportion of reviews considered negative consequences of program participation (9.3%), despite evidence that negative consequences, including adverse events, physical injuries and falls, and worsening subjective wellbeing, are important for designing programs and initiatives. 17

Key Findings to Inform an Evaluation
This review of reviews found a wide variety of evaluation frameworks, theoretical underpinnings, program strategies, and evaluation measures used for different programs and settings. Variability, not only in the frameworks, theories, strategies, and measures, but also in how they were applied to each evaluation makes comparison between programs challenging, and may obscure emerging best practices. Consistency in defining the terms "evaluation framework," "theory," and "strategy" would support the use of evaluation findings in the development of new programs or improvement of current programs (Canadian Institute for Health Information). 12,35,96 Authors should describe key aspects of their evaluation including use of frameworks, theories, strategies, and measures, define evaluation terms, and provide details of the program context to inform future evaluations and program development. 12 High quality evaluations are commonly based on a specific framework that increases the likelihood the evaluation is developed appropriately and comprehensively to meet the specific identified needs. 12 Three quarters of the reviews did not identify any specific evaluation frameworks. An evaluation framework promotes consistency in measurement (e.g., SOFIT) or reporting (e.g., RE-AIM) within program evaluations. In addition to utilizing an evaluation framework, incorporation of a theoretical approach can be important to ensure attention to factors that influence physical activity participation and may impact program effectiveness. 35 Future research should assess and compare the application of evaluation frameworks and theories within the published evaluation literature to inform approaches for future evaluations.
The characteristics of the population of interest such as race, income, and geography should be considered as they impact program participation. 5,6,11-15 Populations from lower socioeconomic groups are less physically active compared to those from higher socioeconomic groups. 5 Although over half of the reviews included equity as an important consideration for primary prevention, only 30% of the articles in this scoping review considered participant characteristics and equitable distribution of physical activity strategies. Thus, understanding the characteristics of the at-risk population a program is designed to address is important, independent of the type of evaluation. In contrast, more than three-quarters of the reviews in this scoping review reported on the effect size of interventions; however, we would argue that contextual data is necessary to understand not only whether a particular program or intervention "works" but for whom it works and under what conditions. Equity considerations and targeted programming can address the challenges of participating in physical activity among populations least likely to be active or more likely to experience barriers to physical activity participation. 5,14 The application of an equity lens including socioeconomic status and sociocultural aspects, such as gender, ethnicity, religion, culture, migrant status, neighborhood characteristics, and social capital, can inform population health interventions. 87 Describing the population informs the usefulness, feasibility, fairness, and accuracy of an evaluation plan. 12 Evaluation measures should include both specific measures of physical activity and sociodemographic, cultural, economic, political, and geographic factors that impact participation. 5,12

Strengths and Opportunities in the Evaluation Literature
Process Indicators. Reviews of evaluations with process indicators proposed future directions for physical activity programs (i.e., strategies and policies) and evaluation tools (i.e., frameworks and standardized instruments) to provide information that can enable the development of context-specific strategies. According to the Introduction to Program Evaluation for Public Health Programs, new programs should use an implementation/process evaluation to assess program implementation and/or examine contextual factors that could affect program activities. 12 The evaluation may include some output or short-term outcome indicators. 6,13 Over half of the articles in this review included process indicators and many of these also included output indicators. Together these process indicators suggest the use of standardized methods and a combination of observation, self-reported measures, and document review to evaluate programs and inform strategies that can be implemented at a societal level.
Active Canada 20/20 14,15 and WHO 5 emphasize the importance of the built environment for physical activity participation. However, the built environment and social infrastructure were infrequently considered within physical activity evaluations. Measuring the influence of the built environment can inform city planning, policy, and physical activity funding streams. Although some tools have been suggested for measuring the influence of the built environment on PA participation, future research should continue to develop these tools. 90
Outcome Indicators. An outcome evaluation can be conducted as soon as the desired outcome(s) can be expected to have occurred (e.g., accelerometer measurements at 6 months or longer can predict the intermediate and long-term outcomes of the program). 12,13,56,57 However, similar to Ling et al, we found that many reviews conducted evaluations within 6 months of the intervention, and few evaluated long-term outcomes of their program. 57 Several organizations posited that evaluations should consider both physical and psychological outcomes, and include possible positive and negative consequences of participating. 12,13,15,85 However, reviews in this scoping review focused largely on physical outcomes (i.e., anthropometric measures and amount of physical activity), with very few considering psychological outcomes and negative consequences of participation. Similar to McGoey et al, 46 almost all of the reviews of evaluations in our review focused solely on determining if a program effectively increased physical activity without considering cost-effectiveness, effect size, or generalizability. 46

Strengths and Limitations of This Review
This scoping review provides a comprehensive review of physical activity evaluation literature in North America.
Informed by current guidelines (e.g., WHO, Active Canada 20/20, and HHS), our review systematically searched for peer-reviewed published review articles that summarized physical activity evaluations. Peer-reviewed literature has been assessed by experts in the field for quality and completeness.
This scoping review is a "review of reviews," and each review included a number of physical activity evaluations. The reviews we included may not have fully detailed all key elements in the evaluation designs they examined, and so we were limited to what was reported in the reviews. Further examination of the original evaluation articles may provide additional details on the evaluation framework used to assess the physical activity program.
To capture a greater breadth of data, the data extraction process focused on obtaining study details through abstract review, which may have limited the depth of detail obtained from each study. However, the full text of each review was assessed for evaluation framework(s), theories, and measures to ensure there were minimal gaps in the presented information on these topics.

Conclusions
Comprehensive evaluation designs support physical activity program improvement as well as the development and expansion of well-designed programs and strategies. This scoping review provides a comprehensive overview of how frameworks, theories, strategies, indicators, and available measures and tools have been used in physical activity evaluation in North America. Based on the findings of this review, a plain language, practice-based guide might contribute to greater use of, and more robust, physical activity evaluations. Capturing participant characteristics within evaluation literature would also help inform universal and targeted approaches to physical activity promotion. Future reviews should include precise descriptions of the guiding theories, frameworks, strategies, and indicators used in each program, which would add clarity and make program effectiveness easier to quantify. Contextual factors, positive and negative outputs/outcomes, the use of evaluation frameworks, and measures of program sustainability can further inform future evaluations, which in turn provide an evidence-base for physical activity programming, policy, and funding.

So What? Implications for Health Promotion Practitioners and Research
What is Already Known on This Topic? Physical inactivity is pervasive and negatively impacts health. Multiple programs attempt to increase physical activity. Robust program evaluations can identify effective promotion strategies; however, it is unclear to what extent existing evaluation frameworks are being applied.
What Does This Article Add? This scoping review of reviews systematically maps and describes evaluations of physical activity programs to summarize key characteristics of the published literature and suggest opportunities to strengthen current evaluations.
What Are the Implications for Health Promotion Practice and Research? We describe review characteristics, evaluation measures, and "good practice characteristics" to inform evaluation strategies. This review defines current terminology and describes the frameworks and measures that have been applied in physical activity evaluations. Contextual factors, negative outcomes, and measures of program sustainability would strengthen future evaluations and provide an evidence-base for physical activity programming, policy, and funding.

Acknowledgments
The team would like to acknowledge Brittney Semenchuk for her assistance with title review and Alyssa Kidd for her assistance in documenting location of the evaluations within each eligible review. Their assistance provided details for the study team to accurately assess the eligibility of each review for this scoping review.

Author Contributions
LK, JR, GH, JE, MH, PW, LG, and AK made substantial contributions to the concept and design of this research. LK and JR contributed to data acquisition. LK, SS, and AK contributed to the analyses. All authors assisted with interpretation of the data. LK and SS drafted the article. All authors reviewed and revised the article and approved this version to be published.

Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by a primary prevention research chair awarded to Alan Katz, sponsored by the Heart and Stroke Foundation and Research Manitoba.

Inclusion criteria
• Published in English between January 2014 and July 2020
• Human subjects of all age groups
• Primary prevention: research that targets the general population and only randomly includes individuals with illness, disease, or conditions
• Review articles: systematic review, meta-analysis, meta-synthesis, scoping review, narrative review, rapid review, critical review, and integrative review
• Focused on physical activity program evaluation
• Research in North America

Exclusion criteria
• Journal articles that are not rigorous reviews (i.e., outside of those defined in the inclusion list), such as book reviews, opinion articles, commentaries, or editorial reviews
• Targeting treatment of a specific disease, illness, or condition
• Reviews with less than 50% of the articles located in North America

Table A2.
(1) Gao, Z., Chen, S., Pasco, D., & Pope, Z. (2015). A meta-analysis of active video games on health outcomes among children and adolescents. Obesity Reviews, 16 (9)

| Section | Item | Checklist item | Reported on |
| --- | --- | --- | --- |
| Eligibility criteria | 6 | Specify characteristics of the sources of evidence used as eligibility criteria (e.g., years considered, language, and publication status), and provide a rationale | Appendix A |
| Information sources^a | 7 | Describe all information sources in the search (e.g., databases with dates of coverage and contact with authors to identify additional sources), as well as the date the most recent search was executed | Page 8-9 |
| Search | 8 | Present the full electronic search strategy for at least one database, including any limits used, such that it could be repeated | Appendix A |
| Selection of sources of evidence^b | 9 | State the process for selecting sources of evidence (i.e., screening and eligibility) included in the scoping review | Page 8-9 |
| Data charting process^c | 10 | Describe the methods of charting data from the included sources of evidence (e.g., calibrated forms or forms that have been tested by the team before their use, and whether data charting was done independently or in duplicate) and any processes for obtaining and confirming data from investigators | Page 9 |
| Data items | 11 | List and define all variables for which data were sought and any assumptions and simplifications made | Page 9 |
| Critical appraisal of individual sources of evidence^d | 12 | If done, provide a rationale for conducting a critical appraisal of included sources of evidence; describe the methods used and how this information was used in any data synthesis (if appropriate) | N/A |
| Synthesis of results | 13 | Describe the methods of handling and summarizing the data that were charted | Page 9 |
| Results | | | |
| Selection of sources of evidence | 14 | Give numbers of sources of evidence screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally using a flow diagram | |
| Synthesis of results | 18 | Summarize and/or present the charting results as they relate to the review questions and objectives | |